
    MgX: Near-Zero Overhead Memory Protection with an Application to Secure DNN Acceleration

    In this paper, we propose MgX, a near-zero-overhead memory protection scheme for hardware accelerators. MgX minimizes the performance overhead of off-chip memory encryption and integrity verification by exploiting the application-specific nature of accelerators. Accelerators tend to manage data movement between on-chip and off-chip memory explicitly, typically at an object granularity much larger than a cache line. Exploiting these accelerator-specific characteristics, MgX generates the version numbers used in memory encryption and integrity verification using only on-chip state, without storing them in memory, and customizes the protection granularity to match the granularity used by the accelerator. To demonstrate the applicability of MgX, we present an in-depth study of MgX for deep neural networks (DNNs) and also describe implementations for H.264 video decoding and genome alignment. Experimental results show that applying MgX incurs less than 1% performance overhead for both DNN inference and training on state-of-the-art DNN architectures.
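    For intuition, here is a minimal Python sketch of the version-number idea (an illustration based on the abstract, not the authors' implementation; the class and its fields are hypothetical). Because writes happen at object granularity and the accelerator tracks them on-chip, the counter-mode version can be regenerated from an on-chip write counter instead of being stored in, and fetched from, off-chip memory:

```python
import hashlib

class VersionedObjectMemory:
    """Counter-mode encryption at object granularity; versions never leave
    the chip. A real design would use AES-CTR in hardware, not SHA-256."""

    def __init__(self, key: bytes):
        self.key = key
        self.write_count = {}  # on-chip state: object_id -> writes so far
        self.off_chip = {}     # untrusted memory: object_id -> ciphertext

    def _keystream(self, object_id: int, version: int, length: int) -> bytes:
        # Keystream derived from (key, object_id, version, block index).
        stream = b""
        block_idx = 0
        while len(stream) < length:
            stream += hashlib.sha256(
                self.key + object_id.to_bytes(8, "big")
                + version.to_bytes(8, "big") + block_idx.to_bytes(8, "big")
            ).digest()
            block_idx += 1
        return stream[:length]

    def write(self, object_id: int, plaintext: bytes) -> None:
        version = self.write_count.get(object_id, 0) + 1
        self.write_count[object_id] = version  # updated on-chip only
        ks = self._keystream(object_id, version, len(plaintext))
        self.off_chip[object_id] = bytes(p ^ k for p, k in zip(plaintext, ks))

    def read(self, object_id: int) -> bytes:
        version = self.write_count[object_id]  # regenerated, never fetched
        ct = self.off_chip[object_id]
        ks = self._keystream(object_id, version, len(ct))
        return bytes(c ^ k for c, k in zip(ct, ks))
```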

    GuardNN: Secure DNN Accelerator for Privacy-Preserving Deep Learning

    This paper proposes GuardNN, a secure deep neural network (DNN) accelerator that provides strong hardware-based protection for user data and model parameters even in an untrusted environment. GuardNN shows that the architecture and protection can be customized for a specific application to provide strong confidentiality and integrity protection with negligible overhead. The design of the GuardNN instruction set reduces the TCB to just the accelerator and enables confidentiality protection without the overhead of integrity protection. GuardNN also introduces a new application-specific memory protection scheme that minimizes the overhead of memory encryption and integrity verification. The scheme shows that most of the off-chip metadata required by today's state-of-the-art memory protection can be removed by exploiting the known memory access patterns of a DNN accelerator. GuardNN is implemented as an FPGA prototype, which demonstrates effective protection with less than 2% performance overhead for inference across a variety of modern DNN models.
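    The claim that most off-chip metadata can be removed rests on the accelerator's memory access pattern being fixed in advance by the DNN's dataflow. Below is a minimal Python sketch of that idea (an illustration inferred from the abstract, not GuardNN's actual mechanism; all names, including writes_per_inference, are hypothetical): the version bound into each MAC is recomputed from the known write schedule instead of being stored and fetched off-chip.

```python
import hashlib
import hmac

def expected_version(buffer_name, inference_index, writes_per_inference):
    # With a static dataflow, how many times each off-chip buffer has been
    # written is a pure function of how many inferences have completed,
    # so no per-block counter needs to live in untrusted memory.
    return inference_index * writes_per_inference[buffer_name]

def mac(key, buffer_name, version, data):
    msg = buffer_name.encode() + version.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_read(key, buffer_name, inference_index, writes_per_inference,
                data, tag):
    # Recompute the expected version on-chip and check the MAC; a stale
    # (replayed) block fails because its tag binds an older version.
    v = expected_version(buffer_name, inference_index, writes_per_inference)
    if not hmac.compare_digest(mac(key, buffer_name, v, data), tag):
        raise ValueError("integrity violation: tampered or replayed data")
    return data
```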

    Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints

    As neural networks continue their reach into nearly every aspect of software operations, the details of those networks become an increasingly sensitive subject. Even those who deploy neural networks embedded in physical devices may wish to keep the inner workings of their designs hidden -- either to protect their intellectual property or as a form of protection against adversarial inputs. The specific problem we address is how, given noisy and imperfect memory traces observed through a heavy system stack, one might reconstruct the neural network architecture, including the set of layers employed, their connectivity, and their respective dimension sizes. Considering both intra-layer architectural features and the inter-layer temporal association information introduced by empirical DNN design practice, we draw upon ideas from speech recognition to solve this problem. We show that off-chip memory address traces and PCIe events provide ample information to reconstruct such neural network architectures accurately. We are the first to propose such accurate model extraction techniques and to demonstrate an end-to-end attack experimentally, in the context of an off-the-shelf Nvidia GPU platform with a full system stack. Results show that the proposed techniques achieve high reverse-engineering accuracy and improve one's ability to conduct targeted adversarial attacks, raising the success rate from 14.6%-25.5% (without network architecture knowledge) to 75.9% (with the extracted network architecture).
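    As a toy illustration of why such traces are so informative (a constructed example, not the paper's speech-recognition-based pipeline; all names are hypothetical): the volume of weight traffic already constrains a fully connected layer's dimensions, and requiring consecutive layers' dimensions to match -- the inter-layer association -- prunes the candidates further.

```python
def candidate_fc_shapes(weight_bytes: int, dtype_size: int = 4):
    """Enumerate (in_dim, out_dim) pairs whose weight matrix matches the
    observed off-chip weight traffic for one fully connected layer."""
    n_params = weight_bytes // dtype_size
    shapes = []
    for m in range(1, int(n_params ** 0.5) + 1):
        if n_params % m == 0:
            shapes.append((m, n_params // m))
            if m != n_params // m:
                shapes.append((n_params // m, m))
    return shapes

def chain_consistent(prev_candidates, next_candidates):
    """Keep only pairs where layer i's output width equals layer i+1's
    input width -- the inter-layer consistency constraint."""
    return [(a, b) for a in prev_candidates for b in next_candidates
            if a[1] == b[0]]

# Example: two layers with 4 MB and 40 KB of observed weight traffic each
# admit many factorizations individually, but far fewer chain together.
l1 = candidate_fc_shapes(1024 * 1024 * 4)   # includes (1024, 1024)
l2 = candidate_fc_shapes(1024 * 10 * 4)     # includes (1024, 10)
print(len(l1), len(l2), len(chain_consistent(l1, l2)))
```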

    Algorithm-Accelerator Co-Design for High-performance and Secure Deep Learning

    176 pages
    Deep learning has emerged as a new engine for many of today's artificial intelligence/machine learning systems, leading to several recent breakthroughs in vision and natural language processing tasks. However, as we move into the era of deep learning models with billions and even trillions of parameters, meeting the computational and memory requirements to train and serve state-of-the-art models has become extremely challenging. Optimizing the computational cost and memory footprint of deep learning models for better system performance is critical to the widespread deployment of deep learning. Moreover, a massive amount of sensitive and private user data is exposed to the deep learning system during the training or serving process. It is therefore essential to investigate potential vulnerabilities in existing deep learning hardware and then design secure deep learning systems that provide strong privacy guarantees for user data and for the models that learn from the data. In this dissertation, we propose to co-design deep learning algorithms and hardware architectural techniques to improve both the performance and the security/privacy of deep learning systems.
    On high-performance deep learning, we first introduce the channel gating neural network (CGNet), which exploits input-dependent dynamic sparsity to reduce the computation of convolutional neural networks. We also co-develop an ASIC accelerator for CGNet that can turn the theoretical FLOP reduction into wall-clock speedup. Second, we present Fast Linear Attention with a Single Head (FLASH), a state-of-the-art language model specifically designed for Google's TPU that achieves transformer-level quality with linear complexity in the sequence length. In our empirical studies on masked language modeling, autoregressive language modeling, and fine-tuning for question answering, FLASH matches or exceeds the quality of the augmented transformer while being significantly faster (e.g., up to 12 times faster).
    On the security of deep learning, we study the side-channel vulnerabilities of existing deep learning accelerators. We then introduce GuardNN, a secure accelerator architecture for privacy-preserving deep learning. GuardNN provides a trusted execution environment (TEE) with specialized protection for deep learning, achieving both a small trusted computing base and low protection overhead. The FPGA prototype of GuardNN incurs a maximum performance overhead of 2.4% across four modern DNN models for ImageNet.
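    As a rough sketch of the channel-gating idea (simplified from the abstract, not the dissertation's implementation; CGNet learns its gating function, whereas a fixed threshold stands in for it here, and the names are hypothetical), a cheap partial sum over a subset of input channels decides, per output location, whether the remaining channels need to be computed at all:

```python
import numpy as np

def channel_gated_pointwise(x, w_base, w_rest, threshold):
    """Channel gating for a 1x1 convolution, kept tiny for clarity.
    x: (C_in, H, W); w_base: (C_out, C_base); w_rest: (C_out, C_in - C_base).
    A cheap partial sum over the first C_base input channels decides, per
    output channel and pixel, whether the remaining channels contribute."""
    c_base = w_base.shape[1]
    base = np.einsum("oc,chw->ohw", w_base, x[:c_base])   # always computed
    gate = np.abs(base) > threshold                        # gating decision
    # For clarity this computes the 'rest' path densely; the CGNet
    # accelerator skips it wherever the gate is off, which is where the
    # FLOP reduction turns into wall-clock speedup.
    rest = np.einsum("oc,chw->ohw", w_rest, x[c_base:])
    return base + gate * rest
```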

    Synthesis of a Temperature-Sensitive Matrine-Imprinted Polymer and Its Potential Application for the Selective Extraction of Matrine from Radix Sophorae Tonkinensis

    A temperature-sensitive matrine-imprinted polymer was prepared in chloroform by free-radical cross-linking copolymerization of methacrylic acid at 60 °C, in the presence of ethylene glycol dimethacrylate as the cross-linker, N-isopropylacrylamide as the temperature-responsive monomer, and matrine as the template molecule. Binding experiments and Scatchard analyses revealed that two classes of binding sites were formed on the molecularly imprinted polymer (MIP) at 50 °C. Additionally, the thermoresponsive MIP was tested as a sorbent material for the selective separation of matrine from the Chinese medicinal plant Radix Sophorae tonkinensis. The thermoresponsive MIP displayed different clean-up and enrichment efficiencies when the SPE protocol was applied at different temperatures.